Proceedings of First International Symposium on Sanskrit Computational Linguistics
Abstract
The most authoritative description of the morphophonemic rules that apply at word boundaries (external sandhi) in Sanskrit is by the great grammarian Pāṇini (fl. 5th c. B.C.E.). These rules are stated formally in Pāṇini’s grammar, the Aṣṭādhyāyī ‘group of eight chapters’. The present paper summarizes Pāṇini’s handling of sandhi, his notational conventions, and formal properties of his theory. An XML vocabulary for expressing Pāṇini’s morphophonemic rules is then introduced, in which his rules for sandhi have been expressed. Although Pāṇini’s notation potentially exceeds a finite state grammar in power, individual rules do not rewrite their own output, and thus they may be automatically translated into a rule cascade from which a finite state transducer can be compiled.

1. SANDHI IN SANSKRIT

Sanskrit possesses a set of morphophonemic rules (both obligatory and optional) that apply at morpheme and word boundaries (the latter are also termed pada boundaries). The former are called internal sandhi (< saṃdhi ‘putting together’); the latter, external sandhi. This paper only considers external sandhi. Sandhi rules involve processes such as assimilation and vowel coalescence. Some examples of external sandhi are: na asti > nāsti ‘is not’, tat ca > tac ca ‘and this’, etat hi > etad dhi ‘for this’, devas api > devo ’pi ‘also a god’.

The symbol 〈’〉 (avagraha) does not represent a phoneme but is an orthographic convention to indicate the prodelision of an initial a-.

This work has been supported by NSF grant IIS-0535207. Any opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation. The paper has benefited from comments by Peter M. Scharf and by four anonymous referees.

2. SANDHI IN PĀṆINI’S GRAMMAR

Pāṇini’s Aṣṭādhyāyī is a complete grammar of Sanskrit, covering phonology, morphology, syntax, semantics, and even pragmatics.
It contains about 4000 rules (termed sūtra, literally ‘thread’), divided between eight chapters (termed adhyāya). Conciseness (lāghava) is a fundamental principle in Pāṇini’s formulation of carefully interrelated rules (Smith, 1992). Rules are either operational (i.e. they specify a particular linguistic operation, or kārya) or interpretive (i.e. they define the scope of operational rules). Rules may be either obligatory or optional.

The traditional classification of rules is more fine-grained and comprises saṃjñā (technical terms), paribhāṣā (interpretive rules), vidhi (operational rules), niyama (restriction rules), pratiṣedha (negation rules), atideśa (extension rules), vibhāṣā (optional rules), nipātana (ad hoc rules), adhikāra (heading rules) (Sharma, 1987, 89).

Proc. of FISSCL, Paris, October 29-31, 2007

A brief review of some well-known aspects of Pāṇini’s grammar is in order. The operational rules relevant to sandhi specify that a substituend (sthānin) is replaced by a substituens (ādeśa) in a given context (Cardona, 1965b, 308). Rules are written using metalinguistic case conventions, so that the substituend is marked as genitive, the substituens as nominative, the left context as ablative (tasmāt), and the right context as locative (tasmin). For instance:

8.4.62 jhayo ho ’nyatarasyām
       jhaY-ABL h-GEN optionally

This rule specifies that (optionally) a homogeneous sound replaces h when preceded by a sound termed jhaY, i.e. an oral stop (Sharma, 2003, 783–784).

Pāṇini uses abbreviatory labels (termed pratyāhāra) to describe phonological classes. These labels are interpreted in the context of an ancillary text of the Aṣṭādhyāyī, the Śivasūtras, which enumerate a catalog of sounds (varṇasamāmnāya) in fourteen classes (Cardona, 1969, 6):
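The interpretation of a pratyāhāra against the Śivasūtras, and the case-marked rule format described above, can be illustrated with a minimal Python sketch. It is not the XML vocabulary of the paper. It assumes the traditional text of the fourteen Śivasūtras in IAST, with each sound held as a separate list item; the names `expand`, `Rule`, `apply_rule`, and `homogeneous_with` are hypothetical. Reading "homogeneous sound" as the voiced aspirate of the preceding stop's series is an assumption of this sketch, chosen to agree with the example etad dhi. The input is taken with the final stop already voiced (etad hi), since that voicing step is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence

# The fourteen Śivasūtras as (sounds, marker) pairs, in IAST.
# The vowel "a" that accompanies consonants in recitation is omitted.
SIVASUTRAS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]


def expand(first: str, marker: str) -> List[str]:
    """Expand a pratyāhāra: all sounds from the first occurrence of `first`
    through the first following sūtra that is closed by `marker`."""
    sounds: List[str] = []
    started = False
    for members, closing_marker in SIVASUTRAS:
        for sound in members:
            started = started or sound == first
            if started:
                sounds.append(sound)
        if started and closing_marker == marker:
            return sounds
    raise ValueError(f"no pratyāhāra {first}-{marker}")


# Stop series by place of articulation; the fourth member is the voiced aspirate.
VARGAS = [
    ["k", "kh", "g", "gh"],
    ["c", "ch", "j", "jh"],
    ["ṭ", "ṭh", "ḍ", "ḍh"],
    ["t", "th", "d", "dh"],
    ["p", "ph", "b", "bh"],
]


def homogeneous_with(preceding: str) -> str:
    """Assumed reading of 'homogeneous': voiced aspirate of the same series."""
    for varga in VARGAS:
        if preceding in varga:
            return varga[3]
    raise ValueError(f"{preceding!r} is not an oral stop")


@dataclass(frozen=True)
class Rule:
    number: str
    left_context: FrozenSet[str]       # ablative (tasmāt)
    substituend: str                   # genitive (sthānin)
    substituens: Callable[[str], str]  # nominative (ādeśa), computed from the left neighbour
    optional: bool = False


JHAY = frozenset(expand("jh", "y"))
RULE_8_4_62 = Rule("8.4.62", JHAY, "h", homogeneous_with, optional=True)


def apply_rule(rule: Rule, phonemes: Sequence[str], take_option: bool = True) -> List[str]:
    """Apply one rule in a single pass. Contexts are matched against the input,
    never against the output, so the rule does not rewrite its own output."""
    out = list(phonemes)
    if rule.optional and not take_option:
        return out
    for i in range(1, len(phonemes)):
        if phonemes[i] == rule.substituend and phonemes[i - 1] in rule.left_context:
            out[i] = rule.substituens(phonemes[i - 1])
    return out


print(apply_rule(RULE_8_4_62, ["e", "t", "a", "d", "h", "i"]))
# prints ['e', 't', 'a', 'd', 'dh', 'i']
```

Because each rule is a single pass from input to output, a sequence of such rules can be composed in order, which is the shape of the rule cascade mentioned in the abstract.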
Similar resources
Sanskrit Computational Linguistics - 4th International Symposium, New Delhi, India, December 10-12, 2010. Proceedings
Sanskrit Linguistics Web Services
We propose to demonstrate a collection of tools for Sanskrit Computational Linguistics developed by cooperating teams in the general setting of Web services. These services offer a systematic architecture integrating multilingual lexicons, morphological generation and analysis, segmentation and parsing, and interlink with the Sanskrit Library digital repository. They may be used as distributed ...
A Collaborative Platform for Sanskrit Processing
Sanskrit, the classical language of India, presents specific challenges for computational linguistics: exact phonetic transcription in writing that obscures word boundaries, rich morphology and an enormous corpus, among others. Recent international cooperation has developed innovative solutions to these problems and significant resources for linguistic research. Solutions include efficient segm...
Design Of A Lexical Database For Sanskrit
We present the architectural design rationale of a Sanskrit computational linguistics platform, where the lexical database has a central role. We explain the structuring requirements issued from the interlinking of grammatical tools through its hypertext rendition.
Lexicon-directed Segmentation and Tagging of Sanskrit
We propose a methodology for Sanskrit processing by computer. The first layer of this software, which analyses the linear structure of a Sanskrit sentence as a set of possible interpretations under sandhi analysis, is operational. Each interpretation proposes a segmentation of the sentence as a list of tagged segments. The method, which is lexicon directed, is complete if the given (stem forms)...
An Effort to Develop a Tagged Lexical Resource for Sanskrit
In this paper we present our efforts the first time of its kind in the history of Sanskrit to design and develop a structured electronic lexical Resource by tagging a Traditional Sanskrit dictionary. We narrate how the whole unstructured raw text of Vaacaspatyam – an encyclopedic type of Sanskrit Dictionary has been tagged to form a user friendly e-lexicon with structured and segregated informa...